Welcome to the tutorial page of the INEA Batch-2, Group B project.

Group B will be working on species distribution modeling (SDM) of target species, such as the leopard, and other key species in Alaungdaw Kathapa National Park. In this tutorial, however, we will use camera trap data from the Htamanthi Wildlife Sanctuary.

In this tutorial, we’ll cover:

  1. What species distribution modeling is and what it is useful for,
  2. Exploring various R packages for SDM,
  3. Plotting and mapping to explore wildlife habitat.

Background

The course gives a brief overview of the concept of species distribution modelling and introduces the main modelling steps. Code and data largely follow the materials of Zurell and Engler (2019), although we use a different case study.

Species distribution models (SDMs) are a popular tool in quantitative ecology (Franklin 2010; Peterson et al. 2011; Guisan, Thuiller, and Zimmermann 2017) and constitute the most widely used modelling framework in global change impact assessments for projecting potential future range shifts of species (IPBES 2016). There are several reasons that make them so popular: they are comparably easy to use because many software packages (e.g. Thuiller et al. 2009; Phillips, Anderson, and Schapire 2006) and guidelines (e.g. Elith, Leathwick, and Hastie 2008; Elith et al. 2011; Merow, Smith, and Silander Jr 2013; Guisan, Thuiller, and Zimmermann 2017) are available, and they have comparably low data requirements.

Step by Steps

  1. Data Preparation
  2. Modeling
  3. Validation
  4. Mapping
  5. Plotting

Load the data

So, let’s get started in R. As mentioned previously, the data used in this tutorial are leopard presence locations in the Htamanthi Wildlife Sanctuary.

First of all, we need to load the required packages. If you have not installed them yet, please install them first.

library(reshape2)         #for re-formatting data; version 1.4.3 used
## Warning: package 'reshape2' was built under R version 4.4.2
library(mgcv)             #for gams; version 1.8-24 used
## Loading required package: nlme
## This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
library(dismo)            #for SDMs; version 1.1-4 used
## Warning: package 'dismo' was built under R version 4.4.2
## Loading required package: raster
## Warning: package 'raster' was built under R version 4.4.2
## Loading required package: sp
## Warning: package 'sp' was built under R version 4.4.1
## 
## Attaching package: 'raster'
## The following object is masked from 'package:nlme':
## 
##     getData
library(rJava)            #for calling maxent from dismo (need Java installed); version 0.9-10 used
library(randomForest)     #for random forest SDMs; version 4.6-14 used
## Warning: package 'randomForest' was built under R version 4.4.2
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
library(maxnet)           #maxent with maxnet; version 0.1.2 used
## Warning: package 'maxnet' was built under R version 4.4.2
library(glmnet)           #needed for maxnet; version 2.0-16 used
## Warning: package 'glmnet' was built under R version 4.4.2
## Loading required package: Matrix
## Loaded glmnet 4.1-8
library(MuMIn)            #for model selection; version 1.42.1 used
## Warning: package 'MuMIn' was built under R version 4.4.2
## 
## Attaching package: 'MuMIn'
## The following object is masked from 'package:randomForest':
## 
##     importance
library(PresenceAbsence)  #for model evaluation; version 1.1.9 used
library(ecospat)         #for model evaluation; version 3.0 used
## Warning: package 'ecospat' was built under R version 4.4.2
require(here)  
## Loading required package: here
## Warning: package 'here' was built under R version 4.4.1
## here() starts at C:/git/leopard_distribution

Step 1: Data Preparation

In this step, the actual biodiversity and environmental data are gathered and processed. This concerns all data required for model fitting, but also data used for making transfers. Special attention should be paid to scaling mismatches, meaning cases where the spatial (or temporal) grain or extent does not match between biodiversity and environmental data, or within the environmental data. In these cases, we need to make decisions about adequate upscaling and downscaling strategies.

Another important issue is the form of absence information available for the biodiversity data. Most SDM applications deal with some form of presence information for the species. These can be direct observations, GPS locations from data loggers, or museum records, among others. All SDM algorithms require some form of absence or background information as a contrast to the presence data. Yet, true absence data are rarely available. In such cases, adequate background data or pseudo-absence data need to be selected. Again, the best strategy will depend on the research question, the data and the SDM algorithm (Guisan, Thuiller, and Zimmermann 2017). Finally, for later model assessment we may wish to partition the data into training data and validation data (Hastie, Tibshirani, and Friedman 2009).
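As a toy illustration of that last point, a random training/validation partition can be drawn with base R's sample(). The data frame below is made up for the example (it is not the leopard dataset):

```r
# Hypothetical presence/background table: 'pres' is 1 for presence, 0 for background
set.seed(42)
dat <- data.frame(pres = rep(c(1, 0), each = 50),
                  elev = runif(100, 100, 1500))

# Random 70/30 partition into training and validation data
train.idx <- sample(seq_len(nrow(dat)), size = 0.7 * nrow(dat))
train <- dat[train.idx, ]
valid <- dat[-train.idx, ]

nrow(train)   # 70 rows for model fitting
nrow(valid)   # 30 rows held out for validation
```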

1.1 Load dataset and variables

lp.data <- read.csv(here("data", "lp_location_raw_v04.csv"))
lp.val <- read.csv(here("data", "lp_location_valid_v04.csv"))

#subset to presence-only / absence-only
lp.pres <- lp.data[lp.data$IDP_event>=1,]
lp.abs <- lp.data[lp.data$IDP_event==0,]
lp.pres.xy <- as.matrix(lp.pres[,c("longitude", "latitude")])
lp.abs.xy <- as.matrix(lp.abs[,c("longitude", "latitude")])

#validation data
lp.val.pres <- as.matrix(lp.val[lp.val$IDP_event>=1, c("longitude", "latitude")])
lp.val.abs <- as.matrix(lp.val[lp.val$IDP_event==0, c("longitude", "latitude")])
lp.val.xy <- as.matrix(lp.val[,c("longitude", "latitude")])

1.2 Covariates

Here, we will use elevation, NDVI, slope, and several distance-related covariates in our modeling process.

#covariate maps
elev <- raster(here("data", "newtif", "elev.tif"))       #elevation layer
ndvi <- raster(here("data", "newtif", "ndvi.tif"))       #Normalised Difference Vegetation Index
slope <- raster(here("data", "newtif", "slp.tif"))       #degree of slope

dist_rd <- raster(here("data","newtif","distance_rasters", "d_rd.tif"))       # Distance to Road
dist_water <- raster(here("data","newtif","distance_rasters","d_water.tif"))    # Distance to Water
dist_patrol <- raster(here("data","newtif","distance_rasters", "d_patrol.tif"))   # Distance to Patrol Stations
dist_vlg <- raster(here("data","newtif","distance_rasters","d_vlg.tif"))  # Distance to Villages


canopy_h <- raster(here("data","newtif", "canopy_h_bdy.tif"))

bio1 <- raster(here("data", "newtif", "bio1.tif"))

bio2 <- raster(here("data", "newtif", "bio2.tif"))

Check coordinate systems

proj4string(elev)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
proj4string(ndvi)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
proj4string(slope)
## [1] "+proj=utm +zone=47 +datum=WGS84 +units=m +no_defs"
proj4string(bio1)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
proj4string(bio2)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
proj4string(canopy_h)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
proj4string(dist_rd)
## [1] "+proj=utm +zone=46 +datum=WGS84 +units=m +no_defs"
proj4string(dist_patrol)
## [1] "+proj=utm +zone=46 +datum=WGS84 +units=m +no_defs"
proj4string(dist_vlg)
## [1] "+proj=utm +zone=46 +datum=WGS84 +units=m +no_defs"
proj4string(dist_water)
## [1] "+proj=utm +zone=46 +datum=WGS84 +units=m +no_defs"

Reproject all layers to the same coordinate system and resample to a common grid

ndvi <- projectRaster(ndvi, crs = crs(elev))
slope <- projectRaster(slope, crs = crs(elev))
bio1 <- projectRaster(bio1, crs = crs(elev))
bio2 <- projectRaster(bio2, crs = crs(elev))
canopy_h <- projectRaster(canopy_h, crs = crs(elev))
dist_rd <- projectRaster(dist_rd, crs = crs(elev))
dist_patrol <- projectRaster(dist_patrol, crs = crs(elev))
dist_vlg <- projectRaster(dist_vlg, crs = crs(elev))
dist_water <- projectRaster(dist_water, crs = crs(elev))
elev <- resample(elev, dist_water, "bilinear")
ndvi <- resample(ndvi, dist_water, "bilinear")
slope <- resample(slope, dist_water, "bilinear")
bio1 <- resample(bio1, dist_water, "bilinear")
bio2 <- resample(bio2, dist_water, "bilinear")
canopy_h <- resample(canopy_h, dist_water, "bilinear")
dist_rd <- resample(dist_rd, dist_water, "bilinear")
dist_patrol <- resample(dist_patrol, dist_water, "bilinear")
dist_vlg <- resample(dist_vlg, dist_water, "bilinear")
compareRaster(elev,ndvi,slope, canopy_h, bio1, bio2, dist_patrol,dist_rd,dist_vlg,dist_water)
## [1] TRUE
layers <- stack(elev,ndvi,slope,canopy_h, bio1, bio2, dist_patrol,dist_rd,dist_vlg,dist_water)
names(layers) <- c("elev","ndvi","slope","canopy_h", "bio1", "bio2", "dist_patrol","dist_rd","dist_vlg","dist_water")
plot(layers)

pairs(layers)
## Warning in graphics::par(usr): argument 1 does not name a graphical parameter
## (warning repeated for each panel)

1.3 Generate availability/background points using dismo

back.xy <- randomPoints(layers, p=lp.pres.xy, n=100)

#inspect
head(back.xy)
##             x        y
## [1,] 95.67196 25.51195
## [2,] 95.73156 25.60138
## [3,] 95.29589 25.28594
## [4,] 95.60789 25.49732
## [5,] 95.93122 25.58106
## [6,] 95.52714 25.47591
#re-name columns
colnames(back.xy) <- c("longitude","latitude")

#plot
plot(elev)
points(back.xy)

1.4 Extract GIS data

pres.cov <- extract(layers, lp.pres.xy)          #extracts values from layers at pres locations
back.cov <- extract(layers, back.xy)               #extracts values from layers at random locations
val.cov <- extract(layers, lp.val.xy)            #extracts values from layers at validation locations

Step 2: Modeling

2.1 GLMs

Let’s start with fitting a simple GLM. We can fit linear, quadratic or higher polynomial terms (check poly()) and interactions between predictors:

  • I() indicates that a variable should be transformed before being used as a predictor in the formula
  • poly(x,n) creates a polynomial of degree n: x + x^2 + … + x^n
  • x1:x2 creates a two-way interaction term between variables x1 and x2; the linear terms of x1 and x2 have to be specified separately
  • x1*x2 creates a two-way interaction term between variables x1 and x2 plus their linear terms
  • x1*x2*x3 creates the linear terms of the three variables, all possible two-way interactions between these variables and the three-way interaction
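Note that the GLM below is fitted to a data frame named all.cov, whose construction is not shown in this tutorial. A minimal sketch of how it could be assembled from the pres.cov and back.cov matrices extracted in step 1.4, assuming a binary pres response column (dummy demo matrices stand in for the real extracts):

```r
# Dummy stand-ins (demo) for the matrices returned by extract() in step 1.4
pres.cov.demo <- matrix(runif(20), nrow = 10,
                        dimnames = list(NULL, c("elev", "ndvi")))
back.cov.demo <- matrix(runif(20), nrow = 10,
                        dimnames = list(NULL, c("elev", "ndvi")))

# Stack presences on top of background points and add the binary response;
# with the real pres.cov and back.cov, the same rbind() yields all.cov
all.cov.demo <- data.frame(rbind(pres.cov.demo, back.cov.demo),
                           pres = rep(c(1, 0), c(nrow(pres.cov.demo),
                                                 nrow(back.cov.demo))))
table(all.cov.demo$pres)   # 10 background (0) and 10 presence (1) rows
```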
glm.lp <- glm(pres~ elev+ndvi
              +slope+canopy_h+dist_patrol+dist_rd+dist_vlg+dist_water, family=binomial(link=logit), data=all.cov)

#inspect
summary(glm.lp)
## 
## Call:
## glm(formula = pres ~ elev + ndvi + slope + canopy_h + dist_patrol + 
##     dist_rd + dist_vlg + dist_water, family = binomial(link = logit), 
##     data = all.cov)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.718e+00  6.196e+00  -1.084 0.278282    
## elev         2.252e-02  5.882e-03   3.829 0.000129 ***
## ndvi         4.129e+00  7.942e+00   0.520 0.603135    
## slope       -7.353e-02  6.886e-02  -1.068 0.285638    
## canopy_h     1.907e-02  5.297e-02   0.360 0.718876    
## dist_patrol -1.851e-04  4.465e-05  -4.145 3.39e-05 ***
## dist_rd     -2.761e-04  6.051e-05  -4.562 5.07e-06 ***
## dist_vlg     7.763e-05  5.973e-05   1.300 0.193691    
## dist_water  -3.069e-04  3.034e-04  -1.012 0.311743    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 221.26  on 164  degrees of freedom
## Residual deviance: 137.15  on 156  degrees of freedom
## AIC: 155.15
## 
## Number of Fisher Scoring iterations: 5
#mapping
glm.map <- predict(layers, glm.lp, type="response")

#plot
plot(glm.map, axes=F, box=F, main="GLM")
points(lp.pres.xy)

plot(glm.lp) # check the model fitness

2.2 Random Forests

Random forests use a bagging procedure to average the outputs of many different CARTs (classification and regression trees) (Liaw and Wiener 2002). Bagging stands for "bootstrap aggregation": we fit many CARTs to bootstrapped samples of the training data and then either average the results, in the case of regression trees, or take a majority vote, in the case of classification trees (committee averaging) (Hastie, Tibshirani, and Friedman 2009; Guisan, Thuiller, and Zimmermann 2017). An important feature of random forests is the out-of-bag samples: the prediction for a specific data point is derived only from trees that did not include this data point during tree growing. Thus, the output of random forests is essentially cross-validated. Random forests estimate variable importance by a permutation procedure, which measures, for each variable, the drop in mean accuracy when that variable is permuted.
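The bagging idea can be illustrated with a toy committee of crude one-split classifiers, each fitted to a bootstrap sample (made-up data, not the leopard dataset, and far simpler than a real CART):

```r
# Toy data: one predictor roughly separating two classes
set.seed(1)
y <- rep(c(0L, 1L), each = 25)
x <- y + rnorm(50, sd = 0.3)

# Each "tree" is a one-split classifier fitted to a bootstrap sample
B <- 200
votes <- replicate(B, {
  idx <- sample(seq_along(y), replace = TRUE)  # bootstrap sample
  thr <- mean(x[idx])                          # split point from this sample
  as.integer(x > thr)                          # class prediction for all points
})

# Committee averaging: majority vote across the B bootstrap classifiers
pred <- as.integer(rowMeans(votes) > 0.5)
mean(pred == y)                                # accuracy of the ensemble
```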

#random forest model (default)
rf.lp <- randomForest(as.factor(pres) ~ elev+ndvi
              +slope+canopy_h+dist_patrol+dist_rd+dist_vlg+dist_water, na.action=na.omit, data=all.cov)

#tuning model
rf.lp.tune <- tuneRF(y=as.factor(all.cov$pres), x = all.cov[,c(3:6)], stepFactor=0.5, ntreeTry=500)
## mtry = 2  OOB error = 32.12% 
## Searching left ...
## mtry = 4     OOB error = 33.94% 
## -0.05660377 0.05 
## Searching right ...
## mtry = 1     OOB error = 33.94% 
## -0.05660377 0.05

#update rf model with mtry=1 based on tuning
rf.lp <- randomForest(as.factor(pres) ~ elev+ndvi
              +slope+canopy_h+dist_patrol+dist_rd+dist_vlg+dist_water, mtry=1, ntree=500, na.action=na.omit, data=all.cov)

#variable importance plot
varImpPlot(rf.lp)

#mapping
rf.map <- predict(layers, rf.lp, type="prob",index=2)

#plot
plot(rf.map, axes=F, box=F, main="RF")
points(lp.pres.xy)

2.3 Maxent

MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners for predicting the distribution of a species from a set of records and environmental predictors.

For Maxent to run, place the maxent.jar file in the following directory:

system.file("java",package="dismo")
## [1] "C:/Users/minhe/AppData/Local/R/win-library/4.4/dismo/java"

Maxent model (default)

max.lp <- maxent(layers, p=lp.pres.xy)
summary(max.lp)
## Length  Class   Mode 
##      1 MaxEnt     S4

Provide background points

max.lp <- maxent(layers, p=lp.pres.xy, a=back.xy)

Tuning a maxent model

maxent.beta.3 <- maxent(layers, p=lp.pres.xy, a=back.xy,
                        args=c("betamultiplier=0.3"))
maxent.beta3 <- maxent(layers, p=lp.pres.xy, a=back.xy,
                       args=c("betamultiplier=3"))
maxent.features <- maxent(layers, p=lp.pres.xy, a=back.xy,
                          args=c("noproduct", "nohinge","nothreshold","noautofeature"))

Evaluate models

eval.max <- evaluate(p=lp.val.pres, a=lp.val.abs, max.lp, layers)
eval.max3 <- evaluate(p=lp.val.pres, a=lp.val.abs, maxent.beta3, layers)
eval.maxfeatures <- evaluate(p=lp.val.pres, a=lp.val.abs, maxent.features, layers)

inspect

eval.max
## class          : ModelEvaluation 
## n presences    : 12 
## n absences     : 16 
## AUC            : 1 
## cor            : 0.9652173 
## max TPR+TNR at : 0.536247
eval.max3
## class          : ModelEvaluation 
## n presences    : 12 
## n absences     : 16 
## AUC            : 1 
## cor            : 0.9691336 
## max TPR+TNR at : 0.6256913
eval.maxfeatures
## class          : ModelEvaluation 
## n presences    : 12 
## n absences     : 16 
## AUC            : 1 
## cor            : 0.9824327 
## max TPR+TNR at : 0.6341303

plot

response(max.lp, expand=0)

response(maxent.beta.3, expand=0)

response(maxent.beta3, expand=0)

response(maxent.features, expand=0)

mapping

max.map <- predict(layers, max.lp)

#plot
plot(max.map, axes=F, box=F, main="Maxent")

mapping with raw output (ROR)

max.raw.map <- predict(layers, max.lp, args="outputformat=raw")

#plot
plot(max.raw.map, axes=F, box=F, main="Maxent-raw")

cellStats(max.raw.map, mean)
## [1] 0.002532264

Step 3: K-fold validation

In the model assessment step, we analyse the fitted model in depth. Strictly speaking, checking the plausibility of the fitted species-environment relationship by visual inspection of response curves, and by assessing model coefficients and variable importance, is also part of model assessment. However, to better understand what the different model algorithms were doing, we already explored this during model fitting. Another crucial aspect of model assessment, which we look at in more detail here, is assessing the predictive performance on a set of validation or test data (Hastie, Tibshirani, and Friedman 2009).

Next, we assess k-fold validation model performance. We inspect different measures: AUC, the area under the receiver operating characteristic (ROC) curve (Hosmer and Lemeshow 2013); TSS, the true skill statistic (Allouche, Tsoar, and Kadmon 2006); sensitivity, the true positive rate; and specificity, the true negative rate. Simultaneously, we estimate the optimal threshold for making binary predictions. For this, we use a threshold that maximises TSS (= maximises the sum of sensitivity and specificity) (Liu et al. 2005).
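The threshold-dependent measures can be computed directly from a confusion matrix. A small helper function (not part of PresenceAbsence, just written here to make the definitions explicit) applied to a tiny made-up example:

```r
# Sensitivity, specificity and TSS from observed labels and binary predictions
conf_metrics <- function(obs, pred) {
  tp <- sum(obs == 1 & pred == 1)   # true positives
  fn <- sum(obs == 1 & pred == 0)   # false negatives
  tn <- sum(obs == 0 & pred == 0)   # true negatives
  fp <- sum(obs == 0 & pred == 1)   # false positives
  sens <- tp / (tp + fn)            # true positive rate
  spec <- tn / (tn + fp)            # true negative rate
  c(sens = sens, spec = spec, tss = sens + spec - 1)
}

conf_metrics(obs  = c(1, 1, 1, 0, 0, 0),
             pred = c(1, 1, 0, 0, 0, 1))
# sens = 2/3, spec = 2/3, tss = 1/3
```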

3.1 Number of models

nmodels <- 3              #number of models compared in the loop below (glm, rf, maxent)
require(glmnet)

3.2 summary table for cross-validation with existing data

summary.eval.kfold <- data.frame(matrix(nrow=0, ncol=11))
names(summary.eval.kfold) <- c("model", "k", "auc", "corr", "ll", "boyce",
                               "threshold", "sens", "spec", "tss", "kappa")
folds <- 5 #number of k-folds considered

3.3 create k-folds

kfold_pres <- kfold(pres.cov, k=folds)
kfold_back <- kfold(back.cov, k=folds)

for(k in 1:folds){
  
  #partition data into folds
  kfold <- k
  val.pres.k <- pres.cov[kfold_pres == kfold, ]
  val.back.k <- back.cov[kfold_back == kfold, ]
  val.k <- rbind(val.pres.k, val.back.k)
  val.k.cov <- val.k[,cbind("elev","ndvi","slope","canopy_h", "bio1", "bio2", "dist_patrol","dist_rd","dist_vlg","dist_water")]
  
  train.pres.k <- pres.cov[kfold_pres != kfold, ]
  train.back.k <- back.cov[kfold_back != kfold, ]
  train.k <- rbind(train.pres.k, train.back.k)
  
  #models fit to fold
  glm.k <- glm(pres~elev+ndvi
              +slope+canopy_h+dist_patrol+dist_rd+dist_vlg+dist_water, 
               family=binomial(link=logit), data=train.k)
  
  
  rf.k <- randomForest(as.factor(pres) ~ elev+ndvi
              +slope+canopy_h+dist_patrol+dist_rd+dist_vlg+dist_water, data=train.k, importance=F, ntree=500, mtry=1, na.action=na.omit)
 
  max.k <- maxent(layers, p=train.pres.k[,1:2], a=train.back.k[,1:2])
  
  #predictions for evaluation
  glm.val <- predict(glm.k,val.k.cov, type="response")
  rf.val <- predict(rf.k,val.k.cov, type="prob")
  rf.val <- rf.val[,2]
  max.val <- predict(max.k, val.k.cov)

  
  #evaluate model on fold
  val.data <- data.frame(siteID=1:nrow(val.k), obs=val.k$pres,
                        glm=glm.val, rf=rf.val, max=max.val)
  
  for(i in 1:nmodels){
    
    #calculate metrics for fold
    auc.i <- auc(val.data, which.model=i)
    kappa.opt <- optimal.thresholds(val.data, which.model=i, opt.methods=3)
    sens.i <- sensitivity(cmx(val.data, which.model=i,threshold = kappa.opt[[2]]))
    spec.i <- specificity(cmx(val.data, which.model=i,threshold = kappa.opt[[2]]))
    tss.i<- sens.i$sensitivity +spec.i$specificity - 1
    kappa.i <- Kappa(cmx(val.data, which.model=i,threshold = kappa.opt[[2]]))
    corr.i<-cor.test(val.data[,2],val.data[,i+2])$estimate
    ll.i <- sum(log(val.data[,i+2]*val.data[,2] + (1-val.data[,i+2])*(1-val.data[,2])))
    ll.i <- ifelse(ll.i=="-Inf", sum(log(val.data[,i+2]+0.001)*val.data[,2] + log((1-val.data[,i+2]))*(1-val.data[,2])),ll.i)
    boyce.i <- ecospat.boyce(fit=val.data[,i+2],obs=val.data[1:nrow(val.pres.k),i+2],res=100,PEplot = F)
    
    #summarize
    summary.i <- c(i,k,auc.i$AUC,corr.i,ll.i, boyce.i$Spearman.cor, kappa.opt[[2]],sens.i$sensitivity,spec.i$specificity,tss.i,kappa.i[[1]])
    summary.eval.kfold <- rbind(summary.eval.kfold, summary.i)
  }
  print(k)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
names(summary.eval.kfold) <- c("model", "k", "auc", "corr", "ll", "boyce",
                               "threshold", "sens", "spec", "tss", "kappa")

inspect

require(plyr)
## Loading required package: plyr
## Warning: package 'plyr' was built under R version 4.4.1
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:here':
## 
##     here
summary.eval.kfold
##    model k       auc      corr        ll boyce threshold sens      spec
## 1      1 1 0.9076923 0.7484967 -13.29936 0.510 1.0000000 0.80 0.8000000
## 2      2 1 1.0000000 0.8858886 -10.20160 0.555 1.0000000 1.00 1.0000000
## 3      3 1 0.9461538 0.7784785 -11.82644 0.540 1.0000000 0.85 0.8500000
## 4      1 2 0.8500000 0.5966658 -16.02960 0.675 0.6923077 0.90 0.5923077
## 5      2 2 0.9192308 0.7542439 -12.34299 0.490 0.9230769 0.95 0.8730769
## 6      3 2 0.9115385 0.7140280 -13.56033 0.490 0.7692308 0.95 0.7192308
## 7      1 3 0.7384615 0.3772249 -22.45122 0.575 0.6153846 0.80 0.4153846
## 8      2 3 0.8480769 0.6390443 -15.23593 0.305 0.8461538 0.75 0.5961538
## 9      3 3 0.8230769 0.5768042 -15.84010 0.490 0.6153846 0.90 0.5153846
## 10     1 4 0.7923077 0.5215505 -20.03856 0.260 0.5384615 0.90 0.4384615
## 11     2 4 0.8692308 0.6540369 -15.12457 0.380 0.6923077 1.00 0.6923077
## 12     3 4 0.8538462 0.6570615 -15.19764 0.275 0.6923077 0.95 0.6423077
## 13     1 5 0.8692308 0.6621747 -14.17743 0.775 0.7692308 1.00 0.7692308
## 14     2 5 0.9673077 0.8334597 -12.14316 0.480 0.8461538 1.00 0.8461538
## 15     3 5 0.9346154 0.7773976 -12.32909 0.520 0.8461538 0.95 0.7961538
##          tss
## 1  0.7591241
## 2  1.0000000
## 3  0.8170055
## 4  0.6086957
## 5  0.8730769
## 6  0.7391304
## 7  0.4210526
## 8  0.5730129
## 9  0.5370741
## 10 0.4634146
## 11 0.7317073
## 12 0.6693387
## 13 0.8016032
## 14 0.8695652
## 15 0.8070175
#average across folds
(summary.eval.kfold.round <- round(ddply(summary.eval.kfold, .(model), summarise,
            auc = mean(auc),
            cor = mean(corr),
            ll = mean(ll),
            boyce = mean(boyce),
            threshold = mean(threshold),
            tss = mean(tss),
            kappa = mean(kappa)
),3))
## Warning in mean.default(kappa): argument is not numeric or logical: returning
## NA
## Warning in mean.default(kappa): argument is not numeric or logical: returning
## NA
## Warning in mean.default(kappa): argument is not numeric or logical: returning
## NA
##   model   auc   cor      ll boyce threshold   tss kappa
## 1     1 0.832 0.581 -17.199 0.559     0.723 0.611    NA
## 2     2 0.921 0.753 -13.010 0.442     0.862 0.809    NA
## 3     3 0.894 0.701 -13.751 0.463     0.785 0.714    NA
summary.eval.kfold.round$model <- c("glm", "rf", "max")

Step 4: Ensembles

We can also combine the predictions from the three SDM algorithms into an ensemble prediction, for example by taking a weighted average or the median.
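For a single raster cell, the AUC-weighted average works like this (the predicted probabilities below are hypothetical; the AUC weights are the k-fold averages from step 3):

```r
# Hypothetical predicted probabilities for one raster cell
preds <- c(glm = 0.20, rf = 0.60, max = 0.40)

# Mean AUC values from the k-fold cross-validation summary
aucs <- c(glm = 0.832, rf = 0.921, max = 0.894)

# AUC-weighted ensemble: models with better AUC get more weight
weighted.mean(preds, aucs)

# The median is a simple, weight-free alternative
median(preds)   # 0.4
```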

4.1 weighted average based on AUC

create a raster stack from predictions

models <- stack(glm.map, rf.map, max.map)
names(models) <- c("glm","rf", "max")

from kfold validation sampling

AUC.rf <- summary.eval.kfold.round[summary.eval.kfold.round$model=="rf", "auc"]
AUC.glm <- summary.eval.kfold.round[summary.eval.kfold.round$model=="glm", "auc"]
AUC.max <- summary.eval.kfold.round[summary.eval.kfold.round$model=="max", "auc"]

4.2 AUC weighted map

auc.weight <- c(AUC.glm, AUC.rf, AUC.max)

#AUC-based ensemble
ensemble.auc <- weighted.mean(models, auc.weight)

#plot
plot(ensemble.auc)

4.3 Frequency ensemble from binary maps

#Get thresholds identified in PresenceAbsence
thres.rf <- summary.eval.kfold.round[summary.eval.kfold.round$model=="rf", "threshold"]
thres.glm <- summary.eval.kfold.round[summary.eval.kfold.round$model=="glm", "threshold"]
thres.max <- summary.eval.kfold.round[summary.eval.kfold.round$model=="max", "threshold"]

#Create binary maps
rf.thres.map <- rf.map
glm.thres.map <- glm.map
max.thres.map <- max.map

values(rf.thres.map) <- 0
values(glm.thres.map) <- 0
values(max.thres.map) <- 0

rf.thres.map[rf.map > thres.rf] <- 1
glm.thres.map[glm.map > thres.glm] <- 1
max.thres.map[max.map > thres.max] <- 1

#plot
plot(rf.thres.map)

plot(glm.thres.map)

plot(max.thres.map)

#Ensemble mapping based on frequency
ensemble.freq <- rf.thres.map + glm.thres.map + max.thres.map

#plot
plot(ensemble.freq)
points(lp.pres.xy)
points(lp.val.pres)

Step 5: Mapping the species distribution

Now that we have carefully fitted the SDMs, inspected model and extrapolation behaviour, and assessed predictive performance, it is finally time to make predictions in space and time. Important points to consider here are the quantification of uncertainty due to input data, algorithms, model complexity and boundary conditions (e.g. climate scenarios) (Araújo et al. 2019; Thuiller et al. 2019). When transferring the model to a different geographic area or time period, it is also recommended to quantify uncertainty due to novel environments (Zurell, Elith, and Schroeder 2012).

par(mfrow = c(2,2))
plot(glm.map, main = "Prediction of Leopard Distribution by GLM")
plot(rf.map, main = "Prediction of Leopard Distribution by Random Forest")
plot(max.map, main = "Prediction of Leopard Distribution by MaxEnt")
plot(ensemble.auc, main = "Prediction of Leopard Distribution \n AUC-weighted average of GLM, Random Forest and MaxEnt")

{plot(ensemble.auc, main = "Presence detection of Leopard in 2014-2020")
points(lp.pres.xy, pch=16)}

Threshold maps, also known as binary maps, are also useful for visual interpretation.

require(sf)
## Loading required package: sf
## Warning: package 'sf' was built under R version 4.4.2
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
ensemble.thres <- rf.thres.map + max.thres.map
hmt_bd <- st_read("C:/git/leopard_distribution/data/Boundary/Htamanthi_BD.shp")
## Reading layer `Htamanthi_BD' from data source 
##   `C:\git\leopard_distribution\data\Boundary\Htamanthi_BD.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 13 fields
## Geometry type: POLYGON
## Dimension:     XYZ
## Bounding box:  xmin: 95.27255 ymin: 25.06531 xmax: 95.93139 ymax: 25.73413
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84
#hmt_shp <- readOGR("C:/git/leopard_distribution/data/Boundary/Htamanthi_BD.shp")

par(mfrow = c(2,2))
plot(rf.thres.map, main = "Thresholds of Leopard Distribution by Random Forest")
plot(max.thres.map, main = "Thresholds of Leopard Distribution by MaxEnt")
plot(ensemble.thres, main = "Thresholds of Leopard Distribution \n Combined binary maps from Random Forest and MaxEnt")
{plot(ensemble.thres, main = "Presence detection of Leopard in 2018-2019")
points(lp.pres.xy, pch=16)}

Let’s make them pretty!!!

ensemble_spdf <- rasterToPoints((ensemble.thres$layer))
ensemble_df <- data.frame(ensemble_spdf)
head(ensemble_df)
##          x       y layer
## 1 95.24851 25.7564     0
## 2 95.24880 25.7564     0
## 3 95.24910 25.7564     0
## 4 95.24940 25.7564     0
## 5 95.24970 25.7564     0
## 6 95.25000 25.7564     0
head(lp.pres.xy)
##   longitude latitude
## 1    95.382   25.264
## 2    95.352   25.239
## 3    95.365   25.222
## 4    95.443   25.327
## 5    95.482   25.296
## 6    95.500   25.290
point_sf = st_as_sf(lp.pres, coords = c("longitude", "latitude"))
st_crs(point_sf) = crs(layers)
require(ggplot2)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.1
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
## 
##     margin
p4 <- ggplot() + 
  geom_raster(data = ensemble_df, mapping = aes(x = x, y = y, fill = layer)) + 
   scale_fill_gradient2(low = "transparent", mid = "darkblue", high = "darkgreen", midpoint = 1, name = "No. of models \npredicted") + 
  geom_sf(data = hmt_bd, colour = "black", fill = NA)+
  geom_sf(data = point_sf, color = "red")+
  xlab("Longitude") + 
  ylab("Latitude")+
  ggtitle("Leopard Distribution Prediction Thresholds \nby GLM, Random Forest and MaxEnt Models") + 
  coord_sf()
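Since the ensemble layer counts how many models flag each cell as suitable, it can also be summarised as an area estimate. A sketch, assuming `ensemble.thres` is the unprojected (WGS 84) agreement raster used above:

```r
# Estimate the area (km^2) predicted suitable by at least two of the three
# models. For lon/lat rasters, raster::area() returns per-cell area in km^2.
library(raster)
cell_km2 <- area(ensemble.thres)
agree2   <- ensemble.thres >= 2     # cells flagged by >= 2 models
cellStats(cell_km2 * agree2, sum)   # total suitable area in km^2
```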

AUC-weighted ensemble prediction

auc_ensemble_spdf <- rasterToPoints((ensemble.auc$layer))
auc_ensemble_df <- data.frame(auc_ensemble_spdf)
head(auc_ensemble_df)
##          x        y      layer
## 1 95.69640 25.74312 0.10727285
## 2 95.69670 25.74312 0.08162694
## 3 95.69700 25.74312 0.09581241
## 4 95.69729 25.74312 0.08859251
## 5 95.69759 25.74312 0.07789990
## 6 95.69789 25.74312 0.08778781
p3<- ggplot() + 
  geom_raster(data = auc_ensemble_df, mapping = aes(x = x, y = y, fill = layer)) + 
  scale_fill_viridis_c(name="Probability of \nLeopard \npresence")+
  geom_sf(data = hmt_bd, colour = "black", fill = NA)+
  geom_sf(data = point_sf, color = "red")+
  xlab("Longitude") + 
  ylab("Latitude")+
  ggtitle("Leopard Distribution Prediction AUC weighted \nby GLM, Random Forest and MaxEnt Models") + 
  coord_sf()
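For reference, an AUC-weighted ensemble of this kind is typically built by weighting each model’s probability surface by its validation AUC. A sketch with illustrative object names (`auc.glm`, `auc.rf`, and `auc.max` are assumptions, not objects defined above):

```r
# Hypothetical sketch of an AUC-weighted ensemble: each probability raster is
# weighted by its model's validation AUC, then rescaled by the weight sum.
w <- c(auc.glm, auc.rf, auc.max)
ensemble.auc <- (glm.map * w[1] + rf.map * w[2] + max.map * w[3]) / sum(w)
```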

Prediction by Random Forest

rf_spdf <- rasterToPoints((rf.map))
rf_df <- data.frame(rf_spdf)
head(rf_df)
##          x        y layer
## 1 95.69640 25.74312 0.144
## 2 95.69670 25.74312 0.090
## 3 95.69700 25.74312 0.158
## 4 95.69729 25.74312 0.128
## 5 95.69759 25.74312 0.084
## 6 95.69789 25.74312 0.102
p1 <- ggplot() + 
  geom_raster(data = rf_df, mapping = aes(x = x, y = y, fill = layer)) + 
  scale_fill_viridis_c(name="Probability of \nLeopard \npresence")+
  geom_sf(data = hmt_bd, colour = "black", fill = NA)+
  geom_sf(data = point_sf, color = "red")+
  xlab("Longitude") + 
  ylab("Latitude")+
  ggtitle("Leopard Distribution Prediction by Random Forest") + 
  coord_sf()

Prediction by MaxEnt

max_spdf <- rasterToPoints((max.map))
max_df <- data.frame(max_spdf)
head(max_df)
##          x        y     layer
## 1 95.69640 25.74312 0.1611708
## 2 95.69670 25.74312 0.1434253
## 3 95.69700 25.74312 0.1168999
## 4 95.69729 25.74312 0.1260262
## 5 95.69759 25.74312 0.1395614
## 6 95.69789 25.74312 0.1507249
p2 <- ggplot() + 
  geom_raster(data = max_df, mapping = aes(x = x, y = y, fill = layer)) + 
  scale_fill_viridis_c(name="Probability of \nLeopard \npresence")+
  geom_sf(data = hmt_bd, colour = "black", fill = NA)+
  geom_sf(data = point_sf, color = "red")+
  xlab("Longitude") + 
  ylab("Latitude")+
  ggtitle("Leopard Distribution Prediction by MaxEnt") + 
  coord_sf()
p2

Prediction by GLM

glm_spdf <- rasterToPoints((glm.map))
glm_df <- data.frame(glm_spdf)
head(glm_df)
##          x        y       layer
## 1 95.69640 25.74312 0.008702545
## 2 95.69670 25.74312 0.005954660
## 3 95.69700 25.74312 0.004313628
## 4 95.69729 25.74312 0.004746292
## 5 95.69759 25.74312 0.004890765
## 6 95.69789 25.74312 0.004428246
p6 <- ggplot() + 
  geom_raster(data = glm_df, mapping = aes(x = x, y = y, fill = layer)) + 
  scale_fill_viridis_c(name="Probability of \nLeopard \npresence")+
  geom_sf(data = hmt_bd, colour = "black", fill = NA)+
  geom_sf(data = point_sf, color = "red")+
  xlab("Longitude") + 
  ylab("Latitude")+
  ggtitle("Leopard Distribution Prediction by GLM") + 
  coord_sf()
p6

library(patchwork)
## Warning: package 'patchwork' was built under R version 4.4.2
## 
## Attaching package: 'patchwork'
## The following object is masked from 'package:raster':
## 
##     area
p6 + p1 + p2+p3+p4 + plot_layout(ncol = 2)
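To export the composite figure, `ggsave()` saves the last plot printed; the file name and dimensions below are illustrative:

```r
# Save the five-panel composite; width/height are in inches.
library(ggplot2)
ggsave("leopard_prediction_maps.png", width = 10, height = 12, dpi = 300)
```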

Step 6: Interpreting environmental relationships

For GLM

prediction.glm <- extract(glm.map, lp.val.xy)
glm.df <- data.frame(predict = prediction.glm, val.cov)

p.glm.elev <- ggplot(glm.df, aes(elev, predict)) +
  geom_point(alpha = 0.3, colour = "blue")+
  geom_smooth(span = 50)+
  ggtitle("a)") +
  ylim(-1, 2)+
  xlab("Elevation (m)")+
  ylab("Probability of Species Presence")

p.glm.slope <- ggplot(glm.df, aes(Slope, predict)) +
  geom_point(alpha = 0.3, colour = "blue")+
  geom_smooth(span = 50)+
  ggtitle("b)") +
  ylim(-1, 2)+
  xlab("Slope (degrees)")+
  ylab("Probability of Species Presence")

p.glm.ndvi <- ggplot(glm.df, aes(ndvi, predict)) +
  geom_point(alpha = 0.3, colour = "blue")+
  geom_smooth(span = 50)+
  ggtitle("c)") +
  ylim(-1, 2)+
  xlab("NDVI")+
  ylab("Probability of Species Presence")

p.glm.canopy <- ggplot(glm.df, aes(canopy_h, predict)) +
  geom_point(alpha = 0.3, colour = "blue")+
  geom_smooth(span = 50)+
  ggtitle("d)") +
  ylim(-1, 2)+
  xlab("Canopy height (m)")+
  ylab("Probability of Species Presence")

p.glm.d_road <- ggplot(glm.df, aes(dist_rd, predict)) +
  geom_point(alpha = 0.3, colour = "blue")+
  geom_smooth(span = 50)+
  ggtitle("e)") +
  ylim(-1, 2)+
  xlab("Distance to road (m)")+
  ylab("Probability of Species Presence")
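The five response plots above are created but never displayed; they can be arranged in a single figure with patchwork, mirroring the prediction maps:

```r
# Arrange the GLM response curves in a single two-column figure.
library(patchwork)
p.glm.elev + p.glm.slope + p.glm.ndvi + p.glm.canopy + p.glm.d_road +
  plot_layout(ncol = 2)
```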

Thank you very much!